Bulletin of Electrical Engineering and Informatics 
Vol. 11, No. 2, April 2022, pp. 1110~1116 
ISSN: 2302-9285, DOI: 10.11591/eei.v11i2.3344


Storage and encryption file authentication for cloud-based data 
retrieval 


Mustafa Qahtan Alsudani¹, Hassan Falah Fakhruldeen¹,², Heba Abdul-Jaleel Al-Asady³, Feryal Ibrahim Jabbar⁴
¹Department of Computer Techniques Engineering, Faculty of Information Technology, Imam Ja’afar Al-Sadiq University, Baghdad, Iraq
²Department of Electrical Engineering, College of Engineering, University of Kufa, Kufa, Iraq
³Department of Computer Technical Engineering, College of Technical Engineering, The Islamic University, Najaf, Iraq
⁴Department of Information Technology (IT), Al-Mustaqbal University College, Babylon, Iraq


Article Info

Article history:
Received Nov 1, 2021
Revised Feb 13, 2022
Accepted Feb 25, 2022

ABSTRACT

The amount of data that must be processed, stored, and modified rises as time passes. An enormous
volume of data from a wide range of sources must be stored on a safe platform. Maintaining such a
large volume of data on a single computer or hard drive is impracticable. As a result, the cloud is
the ideal platform for storing any quantity of data. An advantage of storing data in the cloud is
that it may be accessed at any time and from any device. However, the security of data stored in
the cloud is a big concern. Because of this, despite [...] data should be encrypted before sending
it off to the cloud service provider to avoid this issue. It is a great way to increase the
security of your documents.

1. INTRODUCTION

The term cloud refers to accessing computers, information technology (IT), and software
applications through a network link, often over a wide area network (WAN) or an internet connection
to a data center [1]. A benefit of cloud computing is that provisioning a service is easier. It can
be put to use quickly in many situations and, rather than being constrained by physical geography,
remote users can access cloud services from anywhere they have a connection. Clouds are also
classified into three types according to who has access to the services or infrastructure: private,
public, and hybrid [2]. Public cloud services may be offered to anyone who wants to purchase or
rent them. Private cloud services are created by businesses to be used only by their staff and
partners. Hybrid cloud services integrate the two [3].

Cloud computing has come to mean a service that can be readily delivered through a network link,
commonly using the web or mobile applications [4]. Consider cloud web hosting services (Amazon or
Rackspace), digital content services (Apple iTunes, Amazon, and Netflix), cloud storage services
such as Dropbox or Google Drive, email services (Gmail), or even services for arranging housing or
transportation (Airbnb or Uber) [5]. Business programs such as Microsoft Outlook, typically used on
local networks or devices, are moving to cloud-based apps [2], [3].

Four entities are involved in this scheme: the data owner, the administration server, the cloud
server, and the data user. The data owner wants to upload sensitive data files to the cloud [6].
Due to privacy concerns, files must be encrypted before they are uploaded [4]. With the approval of
the administration server, the file is split into blocks and each block is encrypted with a
standard symmetric encryption algorithm [7]. Before the files are encrypted, keywords are extracted
from them, encrypted, and forwarded to the cloud server for storage. The extracted keywords are
encrypted homomorphically [8]. These encrypted keyword indexes make the search operation easier.
When a data user needs to retrieve files of interest from the cloud, the user first sends a request
to the administration server. The request takes the form of multiple keywords [5], [6]. The data
can be encrypted with any standard symmetric encryption algorithm, such as the advanced encryption
standard (AES) or the data encryption standard (DES) [9]. In this manuscript, data owners use the
AES algorithm to encrypt the data files. As a result, searching over the encrypted data files is a
tedious operation [10]-[15]. Upon receiving the request from the data user, the administration
server authenticates the user. If the user is authorized, the administration server encrypts the
received keywords homomorphically and sends them to the cloud server; otherwise, it discards the
request [16]. The cloud server performs the search to retrieve the files. If the server finds a
match between the stored encrypted keywords and the requested keywords, it obtains the file id,
retrieves the file, recombines its blocks, and forwards it to the requesting data user. The
received file, however, is still in encrypted form [17]. It can be read only with the secret key
that the data owner used to encrypt it, so the user sends the data owner a request for the key. If
the user receives the key from the owner, they can download the file and use it [7].

Storing and retrieving a considerable amount of data in the cloud, which can be accessed anywhere,
at any time, and from any device by data owners or others, raises security and privacy risks for
that data. The research problem is therefore data security in the cloud: to increase privacy, data
is stored in the cloud in encrypted form. Homomorphic encryption (HE) allows computations to be
carried out on encrypted information without first decrypting it. The result of the computation
remains encrypted, and upon decryption it is identical to the result that would have been obtained
had the operations been performed on the unencrypted data. Homomorphic encryption schemes are
inherently malleable [18]. In terms of malleability, HE schemes have weaker security properties
than non-homomorphic schemes.

Fully homomorphic encryption (FHE) makes it possible to evaluate arbitrary circuits of unbounded
depth and is the strongest notion of homomorphic encryption [8], [18]. For most homomorphic
encryption schemes, the multiplicative depth of the circuit is the main practical limitation on
computing over encrypted data. The problem of building an FHE scheme was first proposed in 1978,
within a year of the publication of the Rivest-Shamir-Adleman (RSA) scheme. Whether a solution
existed remained unknown for more than 30 years. Partial results over that time included the
following schemes [19]-[23]: i) RSA cryptosystem (unbounded number of modular multiplications),
ii) ElGamal cryptosystem (unbounded number of modular multiplications), iii) Goldwasser-Micali
cryptosystem (unbounded number of exclusive-or operations), iv) BHE cryptosystem (unbounded number
of modular additions), v) Paillier cryptosystem (unbounded number of modular additions),
vi) Sander-Young-Yung system (which, after more than 20 years, solved the problem for
logarithmic-depth circuits), vii) Boneh-Goh-Nissim cryptosystem (unlimited number of addition
operations but at most one multiplication), and viii) Ishai-Paskin cryptosystem (polynomial-size
branching programs) [10], [21]. To search documents efficiently, a searchable protected index must
be generated for all documents submitted to the cloud. Before splitting the files into blocks, the
administration server (AS) extracts keywords from the file using a rapid keyword extraction
algorithm [24]-[26].


2. METHOD 

The cloud is the most promising medium for storing and retrieving large amounts of data that can be
accessed anywhere, at any time, and from any device. The proposed system takes advantage of this
property, but the security of the data in the cloud is a problem. To increase privacy, data is
stored in the cloud in encrypted form. Previous work performed the search over unencrypted data,
which is a simple task but sacrifices security. To overcome the problem of searching over encrypted
data, a homomorphic encryption algorithm (HEA) is used here to encrypt the keywords that are stored
and searched. HEA is used because it allows operations to be carried out on encrypted data without
decrypting it. Hence, it improves the security of the files stored in the cloud and makes searching
easier. The proposed system consists of four entities: data owner, AS, cloud server, and data user.
The data owner is the person who uploads files to the cloud. The administration server is an
intermediate server between the owner or user and the cloud server. Files can be uploaded and
requested only through the administration server. The data user makes requests to access and update
files on the cloud server. The cloud server ultimately stores the files, performs the search
operation, and provides the requested files to the data user. Figure 1 shows the architecture of
the proposed system.

Figure 1. The system architecture


2.1. Processing unit side 

Four separate entities are involved in cloud storage: the cloud server, the data owner, the
administration server, and the data user. The data owner must first register with the cloud and
then wait until the administration server accepts the registration. The cloud server hosts
third-party data storage and retrieval facilities. The data owner submits the files. Because the
submitted data may contain confidential information, it must be encrypted before outsourcing. The
data owner transfers the file to the AS. After permission is obtained from the AS, the file is
split into several blocks. Each block is encrypted with the standard symmetric encryption algorithm
AES, and the encrypted blocks are forwarded to the cloud server.


2.2. Keyword extraction side 
In this algorithm, extraction proceeds in four phases: preprocessing, building the word
co-occurrence graph, calculating word scores, and keyword extraction.


2.2.1. Preprocessing 

During the preprocessing step, the document text is partitioned into candidate keywords using stop
words and term delimiters. Candidate keywords are sequences of content-bearing words as they occur
in the text. The list of candidate keywords is generated by parsing the text: the delimiters are
first used to split the single text string into an array of words, and this array is then broken
into sequences of contiguous words at the stop word positions. The words within one such sequence
are taken together as a candidate keyword.
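
The partitioning step can be illustrated with a minimal Python sketch. The stop-word list, the
delimiter set, and the function name `candidate_keywords` are illustrative assumptions and are not
taken from the paper; a practical system would use a full stop-word list.

```python
import re

# Illustrative stop-word list (an assumption; a full list would be used in practice).
STOP_WORDS = {"a", "an", "and", "are", "be", "before", "by", "for", "in", "is",
              "it", "of", "on", "or", "should", "the", "to", "with"}

# Punctuation treated as phrase delimiters (also an assumption).
PHRASE_DELIMITERS = re.compile(r'[.,;:!?()\[\]"\n]+')
WORD = re.compile(r"[a-z0-9][a-z0-9'-]*")


def candidate_keywords(text):
    """Split text into candidate keywords: runs of contiguous non-stop words."""
    candidates = []
    for fragment in PHRASE_DELIMITERS.split(text.lower()):
        current = []
        for word in WORD.findall(fragment):
            if word in STOP_WORDS:
                # A stop word closes the current candidate, if there is one.
                if current:
                    candidates.append(tuple(current))
                    current = []
            else:
                current.append(word)
        # A phrase delimiter also closes the current candidate.
        if current:
            candidates.append(tuple(current))
    return candidates
```

For the sentence "Data should be encrypted before sending it to the cloud service provider." the
sketch returns the candidates `data`, `encrypted`, `sending`, and `cloud service provider`.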


2.2.2. Word co-occurrence graph

After the preprocessing stage, the word co-occurrence graph is built from the candidate keywords
of the previous stage. Both the X and Y axes represent the candidate keywords. From the graph, the
frequency of each word and its relation to the other words are identified.


2.2.3. Calculate word score 

From the word co-occurrence graph, the frequency and the degree of each word are obtained. The
frequency of a word is the number of times it occurs in the document. The degree of a word counts
its co-occurrences with words in the candidate keywords, so it favors words that appear in longer
candidates. The score of each word is calculated by dividing its degree by its frequency. In this
way, the score of every candidate keyword is calculated. If a candidate keyword consists of several
words, its score is the sum of its member word scores.
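
The scoring rule can be sketched in Python as follows. The function names are hypothetical, and the
convention that a word's degree includes the word itself (so each occurrence adds the length of the
candidate it appears in) is an assumption borrowed from common degree/frequency keyword scoring,
not something the text states.

```python
from collections import defaultdict


def word_scores(candidates):
    """Return degree/frequency for every word in the candidate keywords."""
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in candidates:
        for word in phrase:
            frequency[word] += 1
            # Assumed convention: each occurrence co-occurs with every word
            # of its candidate, itself included.
            degree[word] += len(phrase)
    return {word: degree[word] / frequency[word] for word in frequency}


def candidate_scores(candidates):
    """Score each distinct candidate as the sum of its member word scores."""
    scores = word_scores(candidates)
    return {phrase: sum(scores[word] for word in phrase)
            for phrase in set(candidates)}
```

With the candidates `("cloud", "server")`, `("cloud",)`, and `("encrypted", "keyword", "index")`,
the word `cloud` has frequency 2 and degree 3, giving a word score of 1.5, while `server` scores
2.0, so the candidate `cloud server` scores 3.5.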


2.2.4. Keyword extraction 
The candidate keywords are ranked by score, and the highest-scoring candidates, numbering one-third
of the words in the graph, are taken as the final keywords used to create the searchable index.
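
A minimal sketch of this selection step is given below. The function name `top_keywords`, the
tie-breaking rule, and the choice to always keep at least one keyword are assumptions made for
illustration.

```python
def top_keywords(scores, word_count):
    """Keep the highest-scoring candidates, one-third of word_count of them.

    scores     -- mapping from candidate keyword to its score
    word_count -- number of distinct words in the co-occurrence graph
    """
    # Sort by descending score; ties are broken by the candidate itself so the
    # result is deterministic (an assumption, the text does not discuss ties).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    keep = max(1, word_count // 3)
    return [candidate for candidate, _ in ranked[:keep]]
```

For example, with three scored candidates drawn from a graph of six words, the two highest-scoring
candidates are kept.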


2.3. Encryption side 

The documents are encrypted before they are uploaded to the global storage space, while
homomorphic encryption is used for the secure index. The AES symmetric encryption algorithm is used
to encrypt the document to be uploaded. Homomorphic encryption supports two basic operations,
defined over the ciphertext domain and the plaintext domain. BHE works over integers and uses the
additive and multiplicative properties of homomorphic encryption. Thus, BHE is applied to the
keyword index. With BHE, computations can be performed over encrypted data. The keywords to be
encrypted are therefore first converted into integers. BHE proceeds as follows:
a. Generating the key: select two large primes p and q, compute n = pq and n^2, and choose the
integer g,

$g = 1 + n$ (1)

Carmichael's function and Euler's totient function:

$\lambda = \operatorname{lcm}(p - 1, q - 1)$ (2)
$\varphi(n) = (p - 1)(q - 1)$ (3)
$[\ldots] = \varphi(n) \bmod n^{2}$ (4)

Public key: (n, g) and private key: (p, q, λ)
b. Encrypting: take a plaintext m with m < n and choose a random r.

Ciphertext $c = g^{m} \cdot r^{n} \bmod n^{2}$ (5)
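
The key generation and encryption steps can be sketched in Python as below. The sketch follows the
standard Paillier construction, which matches (1), (2), and (5). The decryption constant `mu`, the
`decrypt` function, and all function names are assumptions added so that the round trip can be
checked; the text itself does not give a decryption formula.

```python
import math
import secrets


def keygen(p, q):
    """Build a key pair from two distinct primes p and q."""
    n = p * q
    g = n + 1                                      # equation (1)
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1), equation (2)
    mu = pow(lam, -1, n)                           # assumed decryption constant
    return (n, g), (lam, mu)


def encrypt(public_key, m, r=None):
    """Encrypt an integer 0 <= m < n as c = g^m * r^n mod n^2, equation (5)."""
    n, g = public_key
    if not 0 <= m < n:
        raise ValueError("plaintext must satisfy 0 <= m < n")
    if r is None:
        # Draw a random r in [1, n-1] that is coprime to n.
        while True:
            r = secrets.randbelow(n - 1) + 1
            if math.gcd(r, n) == 1:
                break
    n_squared = n * n
    return (pow(g, m, n_squared) * pow(r, n, n_squared)) % n_squared


def decrypt(public_key, private_key, c):
    """Recover m using L(x) = (x - 1) / n (standard Paillier, assumed here)."""
    n, _ = public_key
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n
```

Multiplying two ciphertexts modulo n^2 yields a ciphertext of the sum of the two plaintexts, which
is the additive property the keyword index relies on. The modular inverse via `pow(x, -1, n)`
requires Python 3.8 or later.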


2.4. Searching operation 

The cloud server performs the search operation. The homomorphically encrypted keywords are stored
in the cloud server. The data user sends a request consisting of multiple keywords to the
administration server. After authentication, the administration server encrypts the requested
keywords homomorphically and forwards them to the cloud server. In the cloud server, both the
stored and the requested keywords are integers produced by BHE. The search can then be performed by
subtracting the two integer values, i.e., the stored keyword value and the requested keyword value
[12], [19]. The search operation between the ciphertexts is as follows:


Difference, $d = \dfrac{\left(E_n[a, r] \cdot E_n[b, r]^{-1} - 1\right) \bmod n^{2}}{n} \bmod n$ (6)
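
A short check of why (6) returns the difference of the two plaintexts is given here. It assumes
g = 1 + n as in (1) and that both ciphertexts were produced with the same r, as the notation
E_n[a, r] and E_n[b, r] suggests; the text does not state the second assumption explicitly.

```latex
\begin{align*}
E_n[a, r] \cdot E_n[b, r]^{-1}
  &\equiv g^{a} r^{n} \cdot \left( g^{b} r^{n} \right)^{-1} \pmod{n^{2}} \\
  &\equiv (1 + n)^{a - b} \pmod{n^{2}} \\
  &\equiv 1 + (a - b)\, n \pmod{n^{2}}
\end{align*}
```

Subtracting 1, dividing by n, and reducing modulo n therefore gives d ≡ a − b (mod n), which is
zero exactly when a = b for plaintexts in the range 0 ≤ a, b < n.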


If the subtraction result is zero, the two keywords match; otherwise, they do not. In this way,
exact keyword matches can be found. From the matched keywords, the corresponding file id is
obtained. If a matching file is found, the cloud server combines the split blocks of that file and
sends the encrypted file to the requesting data user. To read the contents of the file, the data
user also needs the secret key that was used to encrypt it. For that purpose, the data user sends a
request, forwarded to both the AS and the data owner, for permission to download the received file.
The file can be downloaded only after permission is obtained from the owner and the AS. The data
owner grants permission in the form of the secret key, and the AS again cross-checks the timestamps
and the nature of the data user's request. If the request is valid, the AS approves the download.
The data user can update the downloaded file with the permission of the corresponding data owner.
The user sends an edit request to the data owner, and the owner can view the changes made by the
user. If the owner approves, the user can make the changes to the file. If the owner disagrees with
the changes to the file's contents, the owner can block the user from further operations. The user
then no longer remains a data user in that cloud environment.
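
The ciphertext comparison in (6) can be sketched in Python as follows. The primes, the shared value
`R`, the hash-based conversion of a keyword into an integer, and all function names are
illustrative assumptions; the paper states only that keywords are converted into integers. The
sketch also assumes that stored and requested keywords are encrypted with the same r, without which
(6) would not cancel the random factor.

```python
import hashlib

# Toy parameters for illustration only (two Mersenne primes).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
N_SQUARED = N * N
G = N + 1
R = 123456789  # shared r, assumed to be fixed by the administration server


def keyword_to_int(keyword):
    """Map a keyword to an integer below N (assumed encoding: truncated SHA-256)."""
    digest = hashlib.sha256(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


def encrypt(m, r=R):
    """c = g^m * r^n mod n^2, as in equation (5)."""
    return (pow(G, m, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def difference(cipher_a, cipher_b):
    """Equation (6): plaintext difference computed from two ciphertexts."""
    ratio = (cipher_a * pow(cipher_b, -1, N_SQUARED)) % N_SQUARED
    return ((ratio - 1) // N) % N


def search(index, requested_cipher):
    """Return file ids whose stored keyword ciphertext matches the request."""
    return [file_id for stored_cipher, file_id in index
            if difference(stored_cipher, requested_cipher) == 0]
```

A small index such as `[(encrypt(keyword_to_int("cloud")), "file-1")]` can then be queried with
`search(index, encrypt(keyword_to_int("cloud")))`. Because r is shared, equal keywords produce
equal ciphertexts, which is what lets the server test equality without holding the private key.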


3. RESULTS AND DISCUSSION 

The proposed system performs better than the existing system that carries out the search operation
using cosine similarity (MKSCS). In MKSCS, indexed keywords are obtained with the Porter stemmer
method and encrypted with any standard encryption algorithm when they are outsourced to the cloud.
The matching operation is performed over unencrypted keywords rather than encrypted ones: during a
search, the encrypted keywords are decrypted, the matching operation is performed, and suitable
files are found. The performance of the proposed system is evaluated in comparison with the
existing MKSCS. Co-occurrence statistical information-based keyword extraction is used for each
metric's outcomes. Figure 2, which concerns the number of keyword comparisons, shows that the
predictive performance typically improves as the number of keywords maintained in the dataset
increases. Figures 2(a) and 2(b) plot the number of candidate keywords against the number of real
keywords, Figure 3 plots the number of real keywords against significance, and Figure 4 plots the
number of requested keywords against the time taken by the matching operation. The results of the
performance evaluation are shown below:

Figure 2. Candidate keyword no. vs. real keywords (a) main effect plot for accuracy and (b) main effect plot
for time matching (s)

Figure 3. No. of real keywords vs. significance

Figure 4. No. of requested keywords vs. time for operation matching


4. CONCLUSION 

Connecting to global storage space over the internet is becoming more popular every day. Security
is the primary issue when data is stored with a trusted third party. Deploying a secure
multi-keyword search over encrypted cloud data is a major challenge because the data is encrypted
before it is sent out. With such a search in place, data owners are encouraged to save their
encrypted data files in a global storage system, where they can be searched and retrieved by many
authorized users. With HEA, computation can be carried out without decryption. Because the server
does not learn the exact data stored by the data owner or requested by the data user, this helps to
enhance data protection.